Abstract. Analysis of data from an Affymetrix Latin Square spike-in experiment 
indicates that measured fluorescence intensities of features on an oligonucleotide 
microarray are related to spike-in RNA target concentrations via a hyperbolic response 
function, generally identified as a Langmuir adsorption isotherm. Furthermore the 
asymptotic signal at high spike-in concentrations is almost invariably lower for a 
mismatch feature than for its partner perfect match feature. We survey a number 
of theoretical adsorption models of hybridization at the microarray surface and find 
that in general they are unable to explain the differing saturation responses of perfect 
and mismatch features. On the other hand, we find that a simple and consistent 
explanation can be found in a model in which equilibrium hybridization is followed by 
partial dissociation of duplexes during the post-hybridization washing phase. 

1. Introduction 

Oligonucleotide microarrays are designed to enable the evaluation of simultaneous 
expression of large numbers of genes in prepared messenger RNA samples. Details 
of the technology and the design and manufacture of Affymetrix GeneChip arrays, the 
focus of this paper, can be found in the review of Nguyen et al. [21] or at the Affymetrix 
website http://www.affymetrix.com/technology/index.affx. The purpose of this 
paper is to examine physical models of hybridization of RNA at the microarray surface 
in the light of differing responses of perfect match and mismatch probes. 

In the manufacture of Affymetrix arrays, single strand DNA probes, 25 bases in 
length, are synthesized base by base onto a quartz substrate using a photolithographic 
process. They are attached to the substrate via short covalently bonded linker molecules 
roughly 10 nanometres apart. A microarray chip surface is divided into some hundreds 
of thousands of regions called features, commonly 11 to 20 microns square, with the 
single strand DNA probes within each feature synthesized to a specific nucleotide 
sequence. 

A key step in the laboratory process of gene detection with microarrays is the 
hybridization of cRNA target molecules fractionated to lengths of typically 50 to 200 
bases onto the single strand DNA probes. The density of hybridized probe-target 
duplexes in each feature is detected via intensity measurements of fluorescent dye 
attached to the target cRNA molecules. Each gene or EST is represented by a set 
of 11 to 20 (dependent on the chip type) pairs of features using sequences of length 25 
selected for their predicted hybridization properties and specificity to the target gene. 
The first element of the pair, termed the perfect match (PM), is designed to be an 
exact match to the target sequence, while the second element, the mismatch (MM), is 
identical except for the middle (13th) base being replaced by its complement. 
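
The PM/MM construction described above can be sketched in a few lines of Python. This is only an illustration: the helper name `mismatch_probe` and the example sequence are hypothetical and are not taken from any Affymetrix chip definition.

```python
# Watson-Crick complements for the bases of a DNA probe.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_probe: str) -> str:
    """Return the MM partner of a 25-base PM probe (hypothetical helper).

    The MM probe is identical to the PM probe except that the middle
    (13th) base is replaced by its complement.
    """
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-base probe")
    middle = 12  # zero-based index of the 13th base
    return pm_probe[:middle] + COMPLEMENT[pm_probe[middle]] + pm_probe[middle + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # made-up sequence, for illustration only
mm = mismatch_probe(pm)
```

The two sequences differ at exactly one position, which is the sense in which the MM feature is a single, well controlled perturbation of its PM partner.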

A number of studies have demonstrated the appropriateness of Langmuir adsorption 
theory for understanding probe-target hybridization at the surface of microarrays. 
Experimental work includes that of Nelson et al. [20], Peterson et al. [22, 23] and Dai 
et al. Analyses which have sought to match Langmuir adsorption isotherms with 
data from an Affymetrix spike-in experiment include those of Held et al. [13], Hekstra et 
al. [12], Lemon et al., Burden et al. [6] and Binder et al. [5]. 

The ultimate aim of such work is to establish a functional relationship between 
measured fluorescence intensities and underlying target concentration parameterized by 
known physical properties such as probe base sequences. If such a relationship could 
be established, it would offer the possibility of an absolute measure of RNA target 
concentration, as opposed to an arbitrarily defined 'expression measure'. Fundamental 
to establishing this relationship is a model which accurately describes the physics of the 
various steps involved in producing a set of intensity measurements from a given mRNA 
target concentration. The two steps we focus on in this paper are hybridization at 
the microarray surface and the subsequent washing step, designed to remove unbound 
target molecules. 



Adsorption models of oligonucleotide microarrays 

A little recognized shortcoming of existing hybridization models based on Langmuir 
adsorption theory is their inability to explain the differing responses of PM and MM 
fluorescence intensity signals at saturation concentrations of RNA. That the asymptotic 
response of a MM feature at high PM-specific spike-in concentration should be less than 
that of the neighbouring PM feature is hardly news to an experimental biologist, and yet 
this observation is surprisingly difficult to reconcile with Langmuir adsorption theory 
(see Section 4). This problem was discussed in the early experimental work of Forman 
et al. [10], who serendipitously recognized the 'unexpected benefit' of the phenomenon 
of differential response between PM and MM, but failed to find a satisfactory physical 
explanation. It is stated in the manufacturer's web page that 'The reason for including 
a MM probe is to provide a value that comprises most of the background cross 
hybridization and stray signal affecting the PM probe. It also contains a portion 
of the true target signal.' [1] Consequently, many researchers have come to view the 
MM signal as primarily an attempt to measure non-specific hybridization and other 
background signal, though in practice there are problems with using the MM signals 
for this purpose [15]. Since the MM signals are more than a measure of non-specific 
hybridization, we will concentrate in this paper on the view that MM features are 
primarily less responsive versions of the PM features, and seek to understand their 
differing responses at saturation. The difference between PM and MM probe signals 
can then be exploited as the result of a single, well controlled change in one of the many 
parameters influencing the complicated process of hybridization. From this perspective 
one can obtain powerful insights into the physics and chemistry of hybridization at the 
microarray surface. 

In Section 2 we review the Langmuir or hyperbolic isotherm and its relationship to 
a well known Affymetrix spike-in data set. Section 3 concentrates on an extension of the 
adsorption based hybridisation models of Hekstra et al. [12] and Halperin et al. [11] which 
include the effects of non-specific hybridization, and which we show to be essentially 
equivalent to each other. This model is consistent with a hyperbolic response function, 
as observed in data from spike-in experiments. However, as we point out in Section 4, it 
is unable to explain the observed difference between PM and MM signals at saturation 
concentrations. Section 5 is a survey of a number of possible improvements to our 
starting model of hybridisation at the microarray surface, which seek to overcome this 
shortcoming. Many of these ideas have been canvassed in the literature, though in 
general they have not been rigorously examined in the light of the Hekstra/Halperin 
model. In general, we find no convincing way of explaining the PM/MM difference 
at saturation by reference only to the hybridisation step. In Section 6 we consider the 
post-hybridisation washing step, and find this to be the most promising explanation for 
the PM/MM difference. In Section 7 we summarize our findings and draw conclusions. 
Many of the technical calculations are relegated to appendices. 

2. The Langmuir isotherm model 

Langmuir adsorption theory is based on an assumption that there are two competing 
processes driving hybridization: adsorption, i.e. the binding of target molecules to 
immobilized probes to form duplexes, and desorption, i.e. the reverse process of duplexes 
dissociating into separate probe and target molecules: 

$$ \mathrm{Probe} + \mathrm{Target} \rightleftharpoons \mathrm{Duplex}. \qquad (1) $$

Herein we shall always use the word 'probe' to indicate single strand DNA immobilised 
on the microarray, 'target' to indicate RNA in solution and 'duplex' to indicate a bound 
probe-target pair. Both the forward and reverse processes are determined by chemical 
rate constants which depend on a number of factors including activation energies and 
temperature. Adsorption models of microarrays often lead to a hyperbolic response 
function, or equilibrium Langmuir isotherm, relating RNA target concentration $x$ to a 
measured equilibrium fluorescence intensity $y$, namely 

$$ y(x) = y_0 + b\,\frac{x}{x + K}. \qquad (2) $$

The isotherm is defined by three parameters: $y_0$ is the measured background intensity 
at zero target concentration, $b$ is the saturation intensity above background at infinite 
target concentration, and $K$ is the target concentration required to reach half saturation. 
The physical origins of these parameters will be discussed in detail below. 
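
As a minimal sketch of how the three parameters act, the hyperbolic isotherm can be evaluated at zero concentration, at the half-saturation concentration and at a very high concentration. The function name and the numerical values below are illustrative assumptions, not fitted values from the spike-in data.

```python
def hyperbolic_isotherm(x, y0, b, K):
    """Hyperbolic (Langmuir) response: intensity at target concentration x.

    x and K must be in the same concentration units.
    """
    return y0 + b * x / (x + K)


# Illustrative parameter values only; they are not fitted values from the paper.
y0, b, K = 100.0, 10000.0, 50.0

background = hyperbolic_isotherm(0.0, y0, b, K)        # equals y0
half_saturation = hyperbolic_isotherm(K, y0, b, K)     # equals y0 + b / 2
near_saturation = hyperbolic_isotherm(1.0e9, y0, b, K)  # approaches y0 + b
```

The three evaluations correspond directly to the verbal definitions of the background, the half-saturation concentration and the asymptotic intensity.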

In a previous paper [6] we have carried out an extensive statistical analysis of 
fits of the hyperbolic and other response functions to the PM probes in the publicly 
available data from the Affymetrix Human HG-U95A Latin Square spike-in experiment 
(http://www.affymetrix.com/support/technical/sample_data/datasets.affx). In 
this experiment genes (or, more precisely, RNA transcripts) were spiked in at cyclic 
permutations of the set of known concentrations, together with a background of cRNA 
extracted from human pancreas. The data consists of fluorescence intensity values from 
a set of 14 probesets corresponding to 14 separate genes, each containing 16 probe pairs. 
For each probeset a set of fluorescence intensity values was obtained for the 14 spiked-in 
concentrations (0, 0.25, 0.5, 1, 2, 4, ..., 1024) pM. The experiment was replicated three 
times using microarray chips from different wafers. In common with previous analyses 
of this data set, our study concentrated on data from 12 of the 14 genes, omitting data 
from two defective genes. 
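
The cyclic-permutation structure of such a design can be sketched as follows. This is a schematic under stated assumptions: the direction of the cyclic shift and the indexing of transcripts and experiments are hypothetical, and the actual assignment used by Affymetrix should be taken from the data set documentation.

```python
# The 14 spiked-in concentrations in pM: zero plus a doubling series 0.25 ... 1024.
concentrations = [0.0] + [0.25 * 2 ** k for k in range(13)]


def latin_square(levels):
    """Row e lists the level given to each transcript in experiment e.

    Each row is the list of levels cyclically shifted by e places, so that
    every transcript sees every level exactly once across the experiments.
    """
    n = len(levels)
    return [[levels[(g + e) % n] for g in range(n)] for e in range(n)]


design = latin_square(concentrations)
```

Each row and each column of `design` contains every concentration exactly once, which is what allows a full response curve to be read off for every probeset.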

Fits of a number of functions to the fluorescence intensities were compared using a 
rigorous statistical analysis. The optimum model of those considered for this data set 
is summarised as follows: 

(i) Measured fluorescence values can be approximated by a Gamma distribution with 
mean given by Eq. (2) and constant coefficient of variation, here ≈ 0.17. 

(ii) The equilibrium isotherm Eq. (2) tracks fold changes from both PM and MM probes 
over the range of spiked-in concentrations from < 1 pM to > 1000 pM. 

(iii) All three parameters $y_0$, $b$ and $K$ are probe sequence dependent (in contrast with 
the findings of ref. [13]). 

(iv) MM features almost invariably saturate at a lower asymptotic intensity $y_0 + b$ than 
their PM counterparts. 
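
Point (i) can be illustrated with a short simulation. For a Gamma distribution with mean m and coefficient of variation c, the shape parameter is 1/c^2 and the scale parameter is m c^2. The sketch below uses only the Python standard library; the mean value, the seed and the helper name `sample_intensity` are arbitrary choices for illustration.

```python
import random
import statistics


def sample_intensity(mean, cv=0.17, rng=random):
    """Draw one Gamma-distributed intensity with the given mean and CV."""
    shape = 1.0 / cv ** 2   # Gamma shape parameter
    scale = mean * cv ** 2  # Gamma scale parameter, so shape * scale == mean
    return rng.gammavariate(shape, scale)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
mean = 5000.0           # illustrative mean intensity, arbitrary units
draws = [sample_intensity(mean, rng=rng) for _ in range(20000)]
sample_cv = statistics.stdev(draws) / statistics.fmean(draws)
```

Because the coefficient of variation is constant, the spread of simulated intensities scales in proportion to the mean given by the isotherm.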

Plots of fits of the hyperbolic response function to intensity data from the 16 PM and 
MM features corresponding to a typical one of the 12 genes considered are reproduced 
from ref. [6] in Fig. 1. A measure of the closeness of the fit is the unscaled deviation, 
defined by Eq. (8) of ref. [6]. This quantity is the analogue for a generalized linear 
model of the mean square error in a standard linear regression. For each of the 12 genes 
in question, the unscaled deviation per degree of freedom is much the same, the gene 
shown in Fig. 1 being somewhere near the middle of the range. Since the complete set 
of models considered in ref. [6] was a set of nested models, we were able to use standard 
statistical tests based on accepted principles of balancing accuracy and parsimony to 
reject alternative functional forms for the fluorescence intensity response function, in 
favour of the hyperbolic form of Eq. (2). The rejected response functions included a 
Sips isotherm [25] and a function modelling non-equilibrium adsorption (see Eq. (15)). 
While details of the analysis were only reported for PM features in our earlier paper, 
we have subsequently also confirmed points (i) to (iii) for MM features (see Appendix 
A for comparison of hyperbolic and Sips isotherms). Point (iv) was confirmed by fits 
of the hyperbolic response function by Hekstra et al. (see Fig. 2A of Ref. [12] and 
accompanying text) and our own calculations, and is apparent from Fig. 1. 

3. Physical models leading to the hyperbolic isotherm 

In what follows we define 'specific' to mean PM specific. All other hybridization will 
be referred to as 'non-specific'. Hekstra et al. [12] have modelled hybridization at the 
microarray surface in the combined presence of a specific cRNA target species and 
a single, non-specific target species using classical chemical adsorption kinetics. The 
model gives a hyperbolic response function of the form Eq. (2) and predicts values for 
the parameters $y_0$, $b$ and $K$ in terms of chemical rate constants and physical properties 
of the microarray. It is straightforward to extend their results to any number of 
non-specific species [6]. 

The hyperbolic isotherm is equivalently derivable from statistical mechanics by 
considering the Gibbs distribution at constant chemical potential [14]. Halperin et al. [11] 
have used this approach to study adsorption in microarray chips in the presence of 
non-specific hybridization. In order to establish a notation for subsequent sections, we 
rederive here the hyperbolic isotherm using the Halperin approach. We shall further 
augment the approach to include partial zippering of duplexes, that is, the idea that 
a particular probe-target duplex can exist in a number of possible partially zipped-up 
configurations $\alpha = 1, 2, \ldots$ (see, for example, ref. [HI). 

Figure 1. Fits of Eq. (2) to fluorescence intensity data for the 16 PM (black) 
and 16 MM (grey) features of the gene 37777_at probeset of the Affymetrix spike-in 
experiment. Concentrations (horizontal axes) are in picomolar and fluorescence 
intensities (vertical axes) are in the arbitrary units used in Affymetrix .cel files. The 
fit to MM probe No. 3 gave unphysical negative values to the parameters $K$ and $b$ and 
is not shown. 

For a given feature on the microarray surface, whether PM or MM, let the 
concentration of target molecules specific to the PM feature of the matched pair be $x$, 
and the concentration of the non-specific species $i$ be $z_i$. Further, let $\theta_\alpha$ be the fraction 
of a given feature covered by specific duplexes in partially zippered configuration $\alpha$, 
and likewise $\phi_{i\alpha}$ be the fraction covered by duplexes formed with non-specific target 
species $i$ in configuration $\alpha$. The fraction covered with unmatched single strand probes 
is therefore $1 - \theta - \sum_i \phi_i$, where the total fractions of sites holding respectively specific 
duplexes and non-specific duplexes of the $i$th species are $\theta = \sum_\alpha \theta_\alpha$ and $\phi_i = \sum_\alpha \phi_{i\alpha}$. The free 
energy per mole of probe sites at the microarray surface is 



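$$ \frac{\gamma}{RT} = \sum_\alpha \theta_\alpha \ln\theta_\alpha + \sum_{i,\alpha} \phi_{i\alpha} \ln\phi_{i\alpha} + \Bigl(1 - \theta - \sum_i \phi_i\Bigr) \ln\Bigl(1 - \theta - \sum_i \phi_i\Bigr) + \sum_\alpha \theta_\alpha \frac{\mu^0_{pt\alpha}}{RT} + \sum_{i,\alpha} \phi_{i\alpha} \frac{\mu^0_{pti\alpha}}{RT} + \Bigl(1 - \theta - \sum_i \phi_i\Bigr) \frac{\mu^0_p}{RT}, \qquad (3) $$

where $\mu^0_{pt\alpha}$, $\mu^0_{pti\alpha}$ and $\mu^0_p$ are respectively reference state chemical potentials per mole 
of specific and non-specific probe-target duplexes in configuration $\alpha$, and of unmatched 
probes‡. $R$ is the gas constant and $T$ the absolute temperature. The exchange chemical 
potentials of the various species of probe-target duplexes are 

$$ \frac{\partial\gamma}{\partial\theta_\alpha} = RT \ln\theta_\alpha - RT \ln\Bigl(1 - \theta - \sum_j \phi_j\Bigr) + \mu^0_{pt\alpha} - \mu^0_p, $$
$$ \frac{\partial\gamma}{\partial\phi_{i\alpha}} = RT \ln\phi_{i\alpha} - RT \ln\Bigl(1 - \theta - \sum_j \phi_j\Bigr) + \mu^0_{pti\alpha} - \mu^0_p. \qquad (4) $$

At equilibrium these exchange chemical potentials balance the chemical potentials of the 
corresponding target molecule species in solution. Assuming the bulk concentrations of 
target molecules are not appreciably affected by hybridization, these are given in terms 
of reference values $\mu^0_t$ and $\mu^0_{ti}$ at reference concentrations $x_0$ and $z_{0i}$ of specific and 
non-specific target molecules by 

$$ \mu_t = \mu^0_t + RT \ln\frac{x}{x_0}, \qquad \mu_{ti} = \mu^0_{ti} + RT \ln\frac{z_i}{z_{0i}}. $$

Matching exchange chemical potentials with target chemical potentials gives 

$$ RT \ln\frac{x}{x_0} = RT \ln\theta_\alpha - RT \ln\Bigl(1 - \theta - \sum_j \phi_j\Bigr) + \Delta G_\alpha, $$
$$ RT \ln\frac{z_i}{z_{0i}} = RT \ln\phi_{i\alpha} - RT \ln\Bigl(1 - \theta - \sum_j \phi_j\Bigr) + \Delta G_{i\alpha}, $$

where we have defined the duplex binding free energies 

$$ \Delta G_\alpha = \mu^0_{pt\alpha} - \mu^0_p - \mu^0_t, \qquad \Delta G_{i\alpha} = \mu^0_{pti\alpha} - \mu^0_p - \mu^0_{ti}. \qquad (5) $$

Solving for the duplex coverage fractions $\theta_\alpha$ and $\phi_{i\alpha}$, and summing over 
configurations $\alpha$, we obtain the isotherms 

$$ \theta = \frac{x/K_s}{1 + x/K_s + \sum_j z_j/K_j}, \qquad (6) $$
$$ \phi_i = \frac{z_i/K_i}{1 + x/K_s + \sum_j z_j/K_j}, \qquad (7) $$

where $K_s$ and $K_i$ are effective equilibrium constants for specific and non-specific 
hybridisations given by 

$$ K_s^{-1} = \frac{1}{x_0} \sum_\alpha e^{-\Delta G_\alpha/RT}, \qquad K_i^{-1} = \frac{1}{z_{0i}} \sum_\alpha e^{-\Delta G_{i\alpha}/RT}. \qquad (8) $$

‡ Halperin et al. [11] also include a term for the charge density dependent electrostatic free energy, 
which we discuss briefly in Section 5.3. 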

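Introducing proportionality constants $b_s$ and $b_i$ for the specific and non-specific 
hybridizations and a physical optical background $a$, the measured fluorescence intensity 
is given by 

$$ y(x) = a + b_s \theta + \sum_i b_i \phi_i \qquad (9) $$
$$ \phantom{y(x)} = y_0 + b\,\frac{x}{x + K}, \qquad (10) $$

where 

$$ y_0 = a + A, \qquad b = b_s - A, \qquad K = K_s B, \qquad (11) $$

and 

$$ A = \frac{\sum_i b_i z_i/K_i}{1 + \sum_i z_i/K_i}, \qquad B = 1 + \sum_i \frac{z_i}{K_i}. \qquad (12) $$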



The presence of non-specific hybridization does not spoil the hyperbolic form of the 
Langmuir isotherm Eq. (2) but does influence the parameters $y_0$, $b$ and $K$. The 
purpose of Eqs. (8), (11) and (12) is to relate the estimated isotherm parameters to 
the underlying physical parameters: $a$ (the physical background value in the absence of 
any hybridization), $b_s$ and $b_i$ (proportionality constants relating the incremental change 
in measured intensity to an incremental change in duplex fraction for the specific and 
non-specific hybridizations respectively), duplex binding energies $\Delta G_\alpha$ and $\Delta G_{i\alpha}$, and 
a set of non-specific background target concentrations $z_i$. The parameters $b_s$ and $b_i$ are 
a measure of the amount of fluorescent light emitted per hybridized target molecule. 
Fluorescent dye is bound only to the target molecules (in fact only to U and C bases), 
so $b_s$ and $b_i$ can only be functions of specific and non-specific target sequences, and not 
probe sequences. Eqs. (10) to (12) are a generalisation of Eq. (2) of Hekstra et al. [12]. 
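
The claim that non-specific hybridization preserves the hyperbolic form can be checked numerically. The sketch below assumes the competitive isotherms in which the specific and each non-specific coverage fraction share the denominator 1 + x/K_s + sum_i z_i/K_i, and assumes the quantities A = (sum_i b_i z_i/K_i)/(1 + sum_i z_i/K_i) and B = 1 + sum_i z_i/K_i, which follow from substituting those isotherms into the intensity expression. All numerical values are invented for illustration.

```python
def coverage_fractions(x, Ks, z, Ki):
    """Specific coverage theta and non-specific coverages phi_i."""
    denom = 1.0 + x / Ks + sum(zi / ki for zi, ki in zip(z, Ki))
    theta = (x / Ks) / denom
    phi = [(zi / ki) / denom for zi, ki in zip(z, Ki)]
    return theta, phi


def intensity_direct(x, a, bs, Ks, z, Ki, bi):
    """Intensity a + bs*theta + sum_i bi*phi_i computed from the coverages."""
    theta, phi = coverage_fractions(x, Ks, z, Ki)
    return a + bs * theta + sum(b * p for b, p in zip(bi, phi))


def hyperbolic_parameters(a, bs, Ks, z, Ki, bi):
    """Return (y0, b, K) of the equivalent hyperbolic response."""
    B = 1.0 + sum(zi / ki for zi, ki in zip(z, Ki))
    A = sum(b * zi / ki for b, zi, ki in zip(bi, z, Ki)) / B
    return a + A, bs - A, Ks * B


# Made-up physical parameters (concentrations in pM, intensities arbitrary).
a, bs, Ks = 50.0, 8000.0, 20.0
z, Ki, bi = [100.0, 300.0], [4000.0, 9000.0], [6000.0, 7000.0]

y0, b, K = hyperbolic_parameters(a, bs, Ks, z, Ki, bi)
max_gap = max(
    abs(intensity_direct(x, a, bs, Ks, z, Ki, bi) - (y0 + b * x / (x + K)))
    for x in (0.0, 0.25, 1.0, 32.0, 1024.0)
)
```

The two expressions agree to rounding error at every concentration, and the sum y0 + b reduces to a + b_s regardless of the non-specific terms.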



4. Inconsistency of adsorption models with observed PM/MM saturation 
intensities 

The model given by Eqs. (8) to (12) inescapably leads to a conclusion that the PM 
and MM intensity measurements for a given probe pair must saturate at the same 
asymptotic intensity value, contradicting the observed fits to experimental data. This 
point has been inferred previously in regard to adsorption models [10], but does not 
appear to be generally appreciated in the literature, with the exception of work by 
Peterson et al. [23]. 

Consider two neighbouring features on a microarray, one PM and one MM, their 
probe sequences differing only by the middle base. Recall that, in this paper, we define 
the word 'specific' to mean those target cRNA which are exact complements to the PM 
sequence, even when dealing with the MM feature. For our purposes, this definition will 
prove useful given that, for most probe pairs, the dominant part of the MM signal at high 
spike-in concentrations in the Affymetrix experiment appears to come from hybridization 
of spiked-in target RNAs complementary to the PM sequence. Parameters relating to 
the PM and MM features will be indicated by superscripts PM and MM respectively. 

Although the sums occurring in Eq. (12) will be over the same set of non-specific 
targets for PM as for MM, one can expect $A^{PM} \neq A^{MM}$ since in general $K_i^{PM} \neq K_i^{MM}$. 
Considering the asymptotic intensities at high concentration, however, Eqs. (10) and 
(11) imply that, under the Hekstra model, the non-specific hybridization effects cancel 
out: 

$$ y_0^{PM} + b^{PM} = a + b_s, \qquad y_0^{MM} + b^{MM} = a + b_s. \qquad (13) $$


An essential step in this argument is the claim that the parameters $a$ and $b_s$ do not differ 
between intensity measurements from a neighbouring PM/MM pair of features. For the 
physical background $a$ this is clearly a reasonable assumption: physical properties of 
the chip in the absence of any hybridization, such as reflectance, are unlikely to vary 
significantly over a distance of a few microns. For the parameter $b_s$ the argument is 
more subtle. From Eq. (9), $b_s$ is, up to a multiplicative constant, the expected number of 
biotin labels per hybridized specific target molecule. Importantly, $b_s$ confers on Eq. (9) 
no information about probe-target binding affinities, this information being contained 
in the coverage fraction $\theta$. By our current definition of 'specific', target molecules 
contributing to the specific part of the signals of a given PM/MM pair of features are 
drawn from the same subset of molecules in the RNA solution, namely those containing a 
contiguous PM-specific subsequence of 25 bases. Hence $b_s$ is the same for both members 
of a neighbouring PM/MM pair. The Hekstra or Halperin model formulated above then 
necessarily entails that $y_0^{PM} + b^{PM} = y_0^{MM} + b^{MM}$, in obvious contradiction with the values 
of $y_0$ and $b$ obtained by fitting the spike-in data. 
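
The argument can be made concrete with a small numerical sketch. Two features share the same background a and the same label constant b_s but have different (made-up) equilibrium constants, the MM feature binding more weakly. Under the competitive Langmuir model both approach the same asymptote a + b_s, even though the MM intensity is lower at any finite concentration. Parameter values and the function name are hypothetical.

```python
def gap_to_saturation(x, a, bs, Ks, z, Ki, bi):
    """Return (a + bs) minus the modelled intensity at concentration x."""
    nonspecific = sum(zi / ki for zi, ki in zip(z, Ki))
    denom = 1.0 + x / Ks + nonspecific
    signal = bs * (x / Ks) + sum(b * zi / ki for b, zi, ki in zip(bi, z, Ki))
    return (a + bs) - (a + signal / denom)


# Shared quantities for the PM/MM pair (illustrative values only).
a, bs = 50.0, 8000.0
z, bi = [100.0, 300.0], [6000.0, 7000.0]

# Feature-specific equilibrium constants: the MM feature binds more weakly.
pm = dict(Ks=20.0, Ki=[4000.0, 9000.0])
mm = dict(Ks=400.0, Ki=[8000.0, 18000.0])

gap_pm_finite = gap_to_saturation(1024.0, a, bs, pm["Ks"], z, pm["Ki"], bi)
gap_mm_finite = gap_to_saturation(1024.0, a, bs, mm["Ks"], z, mm["Ki"], bi)
gap_pm_limit = gap_to_saturation(1.0e12, a, bs, pm["Ks"], z, pm["Ki"], bi)
gap_mm_limit = gap_to_saturation(1.0e12, a, bs, mm["Ks"], z, mm["Ki"], bi)
```

At 1024 pM the MM feature lies further below a + b_s than the PM feature, but both gaps vanish as the concentration grows, so the model cannot produce distinct saturation levels.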

The source of the problem is that any model leading to the coverage fraction given 
by Eq. (6) entails that at sufficiently high specific target concentration, all probes form 
duplexes: as $x \to \infty$, $\theta \to 1$. That is, all probes in the feature are predicted to form 
duplexes if saturated with enough specific target, even in the case of the MM feature. 
A subtle point to note is that this is true irrespective of the bulk solution melting 
temperature of duplexes, which is defined as the temperature at which half the total 
number of single strand targets are free and half are bound as duplexes in bulk solution. 
This temperature can be calculated in terms of enthalpy and entropy by balancing 
forward and backward reaction rates under the constraints of stoicheiometry, namely: 
2[T] + [T.T] = constant, where [T] and [T.T] are bulk concentrations of single strand and 
duplex targets respectively§. However, this stoicheiometric constraint does not apply 
for the adsorption reaction at the microarray surface: because the solution target volume 
is effectively infinite, the target concentration is unchanged as hybridisation proceeds 
and the coverage fraction $\theta$ increases towards its finite equilibrium value $\theta < 1$. 
Even above the bulk solution melting temperature, the forward reaction can be forced 
by setting the target concentration sufficiently high. The upshot is that, irrespective of 
temperature, Langmuir adsorption theory tells us that a feature will saturate at infinite 
target concentration. 

§ In Sec. 5.5 we argue that, for the Affymetrix spike-in data set, [T] $\approx x$, the spike-in concentration. 

5. Other hybridisation effects 

Clearly the above model does not account for all possible effects during the complex 
process of hybridisation. In this section we consider a number of other possible 
hybridisation effects, some of which have been proposed in the literature as putative 
explanations for the differing measured PM/MM saturation intensities. In general, we 
find none of these effects to be a strong candidate, and believe that the explanation of 
the PM/MM saturation difference is unlikely to lie with the hybridisation step. 

5.1. Sips isotherm 

The problem of differential PM/MM saturation was recognized in the context of a 
simple Langmuir model without non-specific hybridization by Peterson et al. [23], who 
explain their experimental data by invoking a Sips isotherm to explain a lower MM 
response curve at high target concentrations. The Sips isotherm [25] is an empirical 
response curve believed to correspond to an adsorption model in which chemical reaction 
rates are drawn from a pseudo-Gaussian distribution. Peterson et al.'s experimental 
results are indeed a good fit to the Sips isotherm; however, their experiment differs 
from the conditions of the hybridization of Affymetrix chips in one important respect, 
namely the hybridization temperature. The Peterson experiment was carried out at 
a hybridization temperature of 20°C, while Affymetrix microarrays are hybridized at 
45°C. Furthermore, Peterson et al. found that heating the hybridization buffer to 37°C 
and then cooling back to 20°C almost completely removed any difference in equilibrium 
saturation intensities between PM and MM probes. This appears to be the effect of a 
first order phase transition which sets in at a temperature well below the Affymetrix 
hybridisation temperature. We comment on the problem of determining the phase 
structure in Sec. 5.4. 

To determine whether the hyperbolic or Sips isotherm is more appropriate for the 
Affymetrix spike-in data we have carried out a statistical analysis comparing the fits 
of the MM data to both isotherms. Our results, summarized in Appendix A, show 
that for the Affymetrix spike-in data the extra parameters involved in invoking the 
Sips isotherm are not significant, and that a hyperbolic response function adequately 
describes the data. We conclude that, at a hybridization temperature of 45°C, the more 
appropriate empirical fit to the spike-in data is Eq. (2), with $y^{MM}(\infty) < y^{PM}(\infty)$. 

5.2. Non-equilibrium hybridization 

In an earlier paper [6] we examined the possibility that hybridization had not reached 
equilibrium in the Affymetrix spike-in experiment. We considered the simple 
non-equilibrium one-step model without non-specific hybridization, namely, 

$$ \frac{d\theta}{dt} = k_f x (1 - \theta) - k_b \theta, \qquad (14) $$

where $k_f$ and $k_b$ are forward and backward chemical reaction rates. The solution 
corresponding to the initial condition $\theta(x, 0) = 0$ is 

$$ \theta(x, t) = \frac{x}{x + K_s} \Bigl[ 1 - e^{-(x + K_s) k_f t} \Bigr], \qquad (15) $$

where $K_s = k_b/k_f$. A statistical analysis of the data showed that the extra degree of 
freedom distinguishing the non-equilibrium from the equilibrium solution (Eq. (2)) is not 
significant. That is, our finding was that the equilibrium solution is the more appropriate 
model. 

However, textbook descriptions of duplex formation (see, for instance, ref. [7], 
pages 1215 to 1219) imply that hybridization is more accurately described as a two 
step process: a slow rate determining step in which an initial two or three base 
pairs form, followed by a fast 'zipping-up' step involving the remaining base pairs. 
Measured forward reaction rates for duplex formation may typically be of the order of 
10^ mol~^sec~^ j2H|; potentially translating to timescales of several hours at picomolar 
concentrations. In order to establish more rigorously that the hybridization had reached 
equilibrium in the spike-in experiment, we have considered in Appendix B a quasi- 
equilibrium hybridization model with two timescales. Chemical reaction rates leading 
to the initiation configuration with two or three base pairs formed are taken to be slow, 
while other reaction rates are assumed to equilibrate on short timescales. Again this 
model leads to non-equilibrium solutions taking the form of Eq. (15), which differs from 
the hyperbolic form observed in the data. This confirms that our previous statistical 
analysis is appropriate even when a two step hybridization process is taken into account. 
We therefore believe equilibrium thermodynamics to be the correct framework for 
studying hybridization for this dataset. 
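
The one-step kinetic solution has a single relaxation time, 1/(k_f (x + K_s)), after which the coverage is indistinguishable from its equilibrium Langmuir value x/(x + K_s). The sketch below evaluates both; the rate constant, the equilibrium constant and the units are hypothetical values chosen only to illustrate the shape of the solution.

```python
import math


def coverage(x, t, Ks, kf):
    """One-step kinetic solution: coverage at time t from an empty feature."""
    return x / (x + Ks) * (1.0 - math.exp(-(x + Ks) * kf * t))


def relaxation_time(x, Ks, kf):
    """Characteristic time 1 / (kf * (x + Ks)) of the exponential approach."""
    return 1.0 / (kf * (x + Ks))


# Hypothetical values: concentrations in pM, kf in 1/(pM s), times in seconds.
Ks, kf = 50.0, 1.0e-6
x = 4.0

tau = relaxation_time(x, Ks, kf)
equilibrium = x / (x + Ks)
after_ten_tau = coverage(x, 10.0 * tau, Ks, kf)
```

Note that the relaxation time is longest at low target concentration, which is why any departure from equilibrium would show up first at the low end of the spike-in range.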



5.3. Electrostatic surface potential 

Halperin et al. [11] include in the free energy Eq. (3) a term $\gamma_{el}$ for the charge 
density dependent electrostatic free energy. The effect of this term is to change the 
effective equilibrium constants $K_s$ and $K_i$ by a finite amount via the replacements 
$\Delta G_\alpha \to \Delta G_\alpha + \partial\gamma_{el}/\partial\theta_\alpha$ and $\Delta G_{i\alpha} \to \Delta G_{i\alpha} + \partial\gamma_{el}/\partial\phi_{i\alpha}$ in Eq. (8). This introduces 
a $\theta$ dependence to $K_s$ and has the potential to change the shape of the isotherm from 
a hyperbolic form [27]. However, it cannot be the explanation for differing PM/MM 
saturation intensities, as the adjusted form of Eq. (6) still satisfies $\theta \to 1$ as $x \to \infty$. 

5.4. Competitive hybridization with probe-probe pairs and probe self-interactions 

Forman et al. [10] advanced the hypothesis that the observed divergence of saturation 
intensities between PM and MM features is caused by hybridization of neighbouring 
probe-probe pairs, rendering a certain fraction of each feature unavailable for target 
binding to form probe-target duplexes. Probe-probe interactions are possible if we 
assume probes of approximate length 8 nm, bound by flexible linker molecules to 
a glass substrate, to have an average interprobe distance of the order of 10 nm [24], 
especially given that some clustering of probes is to be expected. It has also been 
recognised [5] that self-interaction of probes via probe folding may render a fraction of 
probes unavailable for hybridization and affect adsorption isotherms. 

In Appendix C we discuss how hybridization in the presence of probe-probe and 
probe self-interactions may be modelled. In agreement with ref. [11] we find that probe 
self-interactions have the effect of scaling the equilibrium constant $K_s$ for the adsorption 
process. However, this cannot be the explanation for differing PM/MM saturation 
intensities as it does not change the saturation asymptote. The probe-probe interactions, 
on the other hand, are more complex, and we show in Appendix C that one is naturally 
led to the random lattice version of a two dimensional statistical mechanics model 
known as the monomer-dimer model. No solution to this model exists even for the more 
tractable cases of regular lattices, though some numerical work has been done for the 
regular square lattice monomer-dimer model [2]. 

In Appendix C we tackle the unphysical but analytically tractable one dimensional 
model of competitive hybridization with probe-target and probe-probe duplexes. We see 
that a probe-probe binding energy of 1 or 2 kcal mol$^{-1}$ is enough to make a noticeable 
difference to the adsorption isotherm in this approximation (see Fig. 2). The one 
dimensional model saturates at 100% coverage of probe-target duplexes at high target 
concentration and so is unable to explain the divergence of PM and MM saturation 
intensities. However, it is well known that the behaviour of statistical mechanics models 
in one and two dimensions can be very different. It is known, for instance, that a one 
dimensional model with local interactions cannot lead to a phase transition, whereas 
a number of two dimensional models are known to exhibit phase transitions at critical 
temperatures or densities [3]. 

The evidence from numerical calculations of the monomer-dimer model on a 
regular square lattice is that it does not have a phase transition for non-zero monomer 
density |l2j, but we are unaware of any numerical simulations for the random lattice 
case more relevant to our problem. It is therefore still possible that the microarray 
surface configuration could undergo a phase transition from a disordered phase with low 
concentration of probe-probe duplexes to an ordered phase in which a high concentration 
of probe-probe duplexes line up along a particular direction. This could explain 
the differing intensity measurement curves of MM features observed before and after 
quenching in the experiments of Peterson et al. [^Hl- Whether the Forman hypothesis 
can explain the observed difference in PM/MM saturation intensities, however, remains 
an open question, though any such function is unlikely to be consistent with the observed 
hyperbolic response function. 

Figure 2. Plots of the coverage fraction $\theta$ of probe-target duplexes against the 
dimensionless target concentration $x/K'_S$ given by the solution Eqs. (C.11) and (C.12) 
to the one-dimensional model described in Appendix C, for various values of the 
effective probe-probe duplex equilibrium constant $K'_P = K_P/(1 + K_Q)^2$. Probe duplex 
or probe self interaction free energies of $\Delta G_P$ or $\Delta G_Q = 0$, $-1$ and $-2$ kcal mol$^{-1}$ at 
45°C correspond to $K_P$ or $K_Q$ values of 1, 4.9 and 23.7 respectively. 
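
The equilibrium constants quoted in the caption of Figure 2 follow from the relation $K = e^{-\Delta G/RT}$ of Eq. (C.9). A minimal Python sketch of this arithmetic is given below; the function name and the value taken for the gas constant are our own choices for illustration and are not part of the analysis itself.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1 (standard value, assumed here)

def equilibrium_constant(delta_g_kcal, temp_celsius=45.0):
    """Return K = exp(-dG/RT) for a free energy dG given in kcal/mol."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(-delta_g_kcal / (R_KCAL * temp_kelvin))

# Free energies quoted in the caption of Figure 2.
for dg in (0.0, -1.0, -2.0):
    print(f"dG = {dg:+.0f} kcal/mol -> K = {equilibrium_constant(dg):.1f}")
```

Rounded to one decimal place, the three values agree with the 1, 4.9 and 23.7 quoted in the caption.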

5.5. Competitive bulk hybridization 

By competitive bulk hybridization we mean the hybridization of specific target molecules 
T in solution either with (i) other specific target molecules T, which might happen 
to be, at least in part, self complementary ($\mathrm{T} + \mathrm{T} \rightleftharpoons \mathrm{T.T}$), (ii) non-specific target 
molecules T′ which happen to have approximately complementary nucleotide sequences 
($\mathrm{T} + \mathrm{T}' \rightleftharpoons \mathrm{T.T}'$), or (iii) target self-interactions ($\mathrm{T} \rightleftharpoons \mathrm{T}_{\rm folded}$). Halperin et al. [11] 
have considered the effect on equilibrium isotherms of the first two types of bulk 
hybridization, and type (iii) can be dealt with in a similar way. Assuming that probe- 
target hybridization has a negligible effect on bulk target concentrations, they argue 
that equilibrium isotherms can be obtained from isotherms such as Eq. (jUI) by replacing 
the spike-in target concentration x with the single strand concentration [T] obtained by 
applying the law of mass-action to the bulk hybridization reaction in solution. 

For all three types of hybridization, we argue here that competitive bulk 
hybridization cannot explain differential PM/MM saturation. In each case, application 
of the law of mass action entails that $[\mathrm{T}] \to \infty$ as $x \to \infty$, so Eq. (jH)) with $x$ replaced 
by $[\mathrm{T}]$ still implies 100% saturation of features in the high spike-in concentration limit 
for both PM and MM features. 

Furthermore, we can rule out any significant effect on the isotherm from T.T 
hybridization for the probe sequences studied in the Affymetrix spike-in experiment by 
the following argument. The law of mass action implies that the behaviour of the single 
strand concentration goes from $[\mathrm{T}] \approx x$ at spike-in concentrations $x \ll K_{\rm bulk}^{-1}$ (where 
$K_{\rm bulk}$ is the equilibrium constant for the reaction $\mathrm{T} + \mathrm{T} \rightleftharpoons \mathrm{T.T}$) to $[\mathrm{T}] \approx (x/2K_{\rm bulk})^{1/2}$ 
at spike-in concentrations $x \gg K_{\rm bulk}^{-1}$. A significant effect from T.T hybridization would 
therefore lead to a Sips isotherm with parameter $\gamma = \frac{1}{2}$ at high spike-in concentration, 
which, by the analysis of Appendix A, is not observed over the range of concentrations 
in the Affymetrix spike-in experiment. 
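
The two limiting behaviours of the single strand concentration can be checked numerically. The sketch below assumes that the spike-in concentration is conserved as $x = [\mathrm{T}] + 2[\mathrm{T.T}]$ with $[\mathrm{T.T}] = K_{\rm bulk}[\mathrm{T}]^2$, which is our reading of the mass-action argument; the function name and the numerical values are illustrative only.

```python
import math

def single_strand_conc(x, k_bulk):
    """Solve x = [T] + 2*k_bulk*[T]**2 for the free single strand concentration [T]."""
    # Positive root of 2*k_bulk*T^2 + T - x = 0, written to avoid cancellation at small x.
    return 2.0 * x / (1.0 + math.sqrt(1.0 + 8.0 * k_bulk * x))

k_bulk = 1.0  # illustrative association constant (arbitrary units)
for x in (1e-4, 1e4):
    t = single_strand_conc(x, k_bulk)
    # Compare [T] with the low-concentration limit x and the high-concentration limit sqrt(x/2K).
    print(x, t, math.sqrt(x / (2.0 * k_bulk)))
```

At the lower concentration the result is close to $x$, and at the higher concentration it approaches the square-root law, which is what would produce a Sips exponent of one half.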



6. The washing step 

The hybridization step is followed by a washing step designed to remove unbound target 
molecules before scanning the microarray. During the washing step the target solution is 
flushed out of the cartridge containing the microarray and replaced by a washing buffer 
containing no RNA. Thus the ambient concentration of target molecules is set to zero, 
switching off the forward adsorption reaction. We argue here that the washing step 
is responsible for the measured differences between PM/MM intensity measurements 
at saturation concentrations. This idea has been proposed briefly by ZhangjSH], but 
requires further analysis. 

Let us assume that, immediately prior to washing, duplex coverage fractions on 
a given feature are given by the equilibrium model set out in Section El. That is, 
the fraction $\theta$ of sites on a feature occupied by specific probe-target duplexes and the 
fraction $\phi_i$ covered by non-specific duplexes of species $i$ are given by Eqs. (jHI) and ((Zj). 
During the washing process some of the duplexes will be dissociated. Suppose that 
the probability that a given probe-target duplex has survived up to a washing time $t_W$ 
is $s(t_W)$ for a specific duplex and $s_i(t_W)$ for a non-specific duplex of species $i$. The 
survival functions $s$ and $s_i$ depend only on probe and target base sequences and not 
on the ambient target concentrations $x$ and $z_i$ present during the prior hybridization step. 
They satisfy $s(0) = 1$ and are monotonically decreasing. The specific and non-specific 
duplex coverage fractions at time $t_W$ are then 

$$\theta(x, t_W) = \frac{s(t_W)\, x/K_S}{1 + x/K_S + \sum_i z_i/K_i}, \tag{16}$$

$$\phi_i(x, t_W) = \frac{s_i(t_W)\, z_i/K_i}{1 + x/K_S + \sum_j z_j/K_j}. \tag{17}$$

Repeating the assumption used in Section 2 that the measured fluorescence intensity 
is a linear function of the duplex coverage fractions, that is 
$y(x, t_W) = a + b_S\,\theta(x, t_W) + \sum_i b_i\,\phi_i(x, t_W)$, we find that at fixed $t_W$ the 
hyperbolic form required by points (ii) and (iii) in Section 2, namely 

$$y(x, t_W) = y_0(t_W) + b(t_W)\frac{x}{x + K}, \tag{18}$$

is maintained, and that the 'observed' parameters $y_0$, $b$ and $K$ are now given by 

$$y_0(t_W) = a + A(t_W), \qquad b(t_W) = s(t_W)\,b_S - A(t_W), \qquad K = K_S B, \tag{19}$$

where 

$$A(t_W) = \frac{1}{B}\sum_i \frac{s_i(t_W)\, b_i z_i}{K_i}, \qquad B = 1 + \sum_i \frac{z_i}{K_i}. \tag{20}$$



Note that the parameter $K$ is unaffected by the length of the washing process, and 
depends only on duplex binding free energies via the hybridization step. The asymptotic 
fluorescence intensity at high target concentration, 

$$y(\infty, t_W) = y_0(t_W) + b(t_W) = a + s(t_W)\,b_S, \tag{21}$$

is depressed by the presence of the survival fraction $s(t_W)$. 
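
The claim that washing preserves the hyperbolic form can be verified numerically. The following Python sketch evaluates the intensity directly from the washed coverage fractions and compares it with the hyperbolic form built from the 'observed' parameters $y_0$, $b$ and $K$. All numerical values, and the representation of the non-specific species as a list of tuples $(b_i, s_i, z_i, K_i)$, are hypothetical and chosen only to exercise the formulas.

```python
def washed_intensity(x, a, b_s, s, k_s, nonspecific):
    """Intensity y = a + b_S*theta + sum_i b_i*phi_i after washing.

    nonspecific is a list of (b_i, s_i, z_i, K_i) tuples, one per cross-hybridizing species.
    """
    denom = 1.0 + x / k_s + sum(z / k for (_, _, z, k) in nonspecific)
    theta = s * (x / k_s) / denom
    phi_sum = sum(b * si * (z / k) for (b, si, z, k) in nonspecific) / denom
    return a + b_s * theta + phi_sum

def observed_parameters(a, b_s, s, k_s, nonspecific):
    """Return (y0, b, K) of the hyperbolic form y = y0 + b*x/(x + K)."""
    big_b = 1.0 + sum(z / k for (_, _, z, k) in nonspecific)
    big_a = sum(b * si * z / k for (b, si, z, k) in nonspecific) / big_b
    return a + big_a, s * b_s - big_a, k_s * big_b

# Hypothetical values chosen only to exercise the formulas.
a, b_s, s, k_s = 50.0, 31100.0, 0.6, 10.0
nonspecific = [(20000.0, 0.3, 5.0, 400.0), (15000.0, 0.2, 2.0, 900.0)]
y0, b, K = observed_parameters(a, b_s, s, k_s, nonspecific)
for x in (0.0, 1.0, 64.0, 1024.0):
    direct = washed_intensity(x, a, b_s, s, k_s, nonspecific)
    hyperbolic = y0 + b * x / (x + K)
    print(f"x = {x:7.1f}  direct = {direct:10.3f}  hyperbolic = {hyperbolic:10.3f}")
```

The two columns agree to rounding error, and $y_0 + b$ reduces to $a + s\,b_S$ whatever the non-specific background, so only the specific survival fraction enters the saturation intensity.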

To model the survival function $s(t_W)$, one expects the rate of dissociation of specific 
probe-target duplexes to be the product of the fraction $\theta(t_W)$ of probes forming specific 
duplexes and a washing rate $\kappa$ which depends only on the probe and target nucleotide 
sequences. Assuming then that $\kappa$ is independent of $t_W$, the survival function is 

$$s(t_W) = e^{-\kappa t_W}. \tag{22}$$

Since the binding affinity of a PM-specific target to a MM probe is less than to a PM 
probe, we expect in general that $\kappa^{\rm MM} > \kappa^{\rm PM}$, or equivalently, $s^{\rm MM}(t_W) < s^{\rm PM}(t_W)$ and 
hence $y^{\rm MM}(\infty) < y^{\rm PM}(\infty)$ as required. 

Ideally one would like to test directly the veracity of the survival function Eq. (22) 
using data from a range of washing times. While this is not possible with spike-in data 
corresponding to a single value of $t_W$, we can at least check for qualitative agreement of 
the above scenario with probe sequence information. 

From Eqs. (19) and (22) one obtains 

$$\kappa t_W = \log b_S - \log\left[y_0(t_W) + b(t_W) - a\right]. \tag{23}$$

For fixed $t_W$, the left hand side is a measure of the rate at which probe-target duplexes 
dissociate due to washing, and should increase with decreasing binding affinity. The 
right hand side depends on the fitted isotherm parameters $y_0(t_W)$ and $b(t_W)$, and two 
unknown parameters: $a$, the physical background, and $b_S$, the fluorescence intensity 
above background of a feature fully saturated with PM-specific probe-target duplexes. 
In order to make comparisons across the fitted spike-in data, we will take the two 
unknown parameters to be constant across all features of the microarray. While this 
may seem to be a radical assumption for $b_S$, we argue that, because the target mRNA 
is fractionated randomly to lengths of between 50 and 200 bases, the distribution of 
the number of U and C bases carrying biotin labels on PM-specific targets will not be 
strongly influenced by the relatively short 25-base subsequence of the probe. The total 
number of bases carrying labels on a saturated feature could therefore be replaced by a 
typical representative value independent of the feature. 
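
As an illustration of how the estimate $\kappa t_W = \log b_S - \log[y_0 + b - a]$ is evaluated from fitted isotherm parameters, a minimal Python sketch follows. The function name is hypothetical, the default values $a = 50$ and $b_S = 31100$ are those adopted later in this section, and the fitted parameters used for the example PM/MM pair are invented for illustration and are not taken from the spike-in data.

```python
import math

def washing_rate_times_time(y0, b, a=50.0, b_s=31100.0):
    """Estimate kappa*t_W = log(b_S) - log(y0 + b - a) from fitted isotherm parameters."""
    signal = y0 + b - a  # asymptotic intensity above the physical background
    if signal <= 0.0:
        raise ValueError("y0 + b must exceed the background a")
    return math.log(b_s) - math.log(signal)

# Hypothetical fitted parameters for a PM/MM pair (not taken from the spike-in data).
pm = washing_rate_times_time(y0=120.0, b=18000.0)
mm = washing_rate_times_time(y0=110.0, b=9000.0)
print(pm, mm, mm - pm)
```

The difference between the MM and PM estimates does not depend on the value chosen for $b_S$, since the $\log b_S$ terms cancel.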

In Fig. 3 we plot the right hand side of Eq. (23) against RNA/DNA duplex free 
binding energy in bulk solution calculated using the nearest neighbour stacking model 
and parameters of ref. [26]. Values of $y_0$ and $b$ are from fits of the PM data to hyperbolic 
isotherms as described in Section 2. The value $a = 50$ was chosen to be slightly less 
than the lowest intensity value from the entire data set, though in practice any positive 
value up to 100 gives an almost identical plot. The choice of parameter $b_S = 31100$ only 
affects the vertical offset, and has been determined by setting $\log b_S = -\alpha$, where $\alpha$ is 
the intercept in a linear regression to an empirical function (also plotted in Fig. 3) 

$$-\log\left[y_0(t_W) + b(t_W) - a\right] = \alpha + \beta e^{\Delta G/(\lambda RT)}, \tag{24}$$

in which $\lambda = 11.1$ has been chosen to minimize the residual standard error. The linear 
regression gives $\alpha = -10.3$ and $\beta = 55.4$. We see that the data are consistent with a rate 
of duplex removal during washing that decreases exponentially to zero with increasing 
binding energy $-\Delta G$. The factor $\lambda$ reflects the fact that effective duplex binding energies 
at the microarray surface are considerably less than bulk solution binding energies due 
to effects such as electrostatic blocking [18] (see also Subsection 5.3) and a consequent 
enhancement of partial zippering (see Eq. (jH}). 
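
Combining the estimate of $\kappa t_W$ with the empirical fit gives $\kappa t_W = \beta e^{\Delta G/(\lambda RT)}$ once $\log b_S$ is identified with minus the regression intercept. The Python sketch below evaluates this curve. It assumes that $\Delta G$ is expressed in kcal mol$^{-1}$ and that $T$ is the hybridization temperature of 45°C; neither assumption is confirmed by the text above, and the free energy values in the loop are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def fitted_washing_rate(delta_g, beta=55.4, lam=11.1, temp_kelvin=318.15):
    """Return kappa*t_W = beta*exp(dG/(lambda*R*T)) for a bulk free energy dG in kcal/mol.

    temp_kelvin = 318.15 (45 C) is an assumption made for this sketch.
    """
    return beta * math.exp(delta_g / (lam * R_KCAL * temp_kelvin))

# log b_S for b_S = 31100, to be compared with the magnitude of the regression intercept.
print(round(math.log(31100.0), 1))

for dg in (-15.0, -25.0, -35.0):  # illustrative bulk free energies in kcal/mol
    print(dg, fitted_washing_rate(dg))
```

The estimated washing rate falls monotonically towards zero as the binding energy $-\Delta G$ increases, as described above.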

In Fig. 4 we examine the dependence of the estimated washing rate Eq. (23) on the 
nucleotide composition of probe sequences. The upper four bar charts show estimated 
PM (MM) washing rates averaged over sequences with a particular base at the $i$th 
position ($i = 1, \ldots, 25$) minus estimated washing rates averaged over all PM (MM) 
probes. As expected, washing rates are generally lower than average for strong hydrogen 
bonded bases C and G occurring in the DNA probe sequences and higher than average 
for A and T. This is the case for both PM and MM probes. Interestingly, with the 
exception of the mismatched central MM base, there seems to be no obvious relationship 
between the strength of the effect and position along the probe. 

The remaining two bar charts show the analogous contrasts for the difference 
$(\kappa^{\rm MM} - \kappa^{\rm PM})\,t_W$. The estimate of this quantity, determined from Eq. (23), is independent 
of $b_S$, and so conclusions drawn from these bar charts do not rely on the assumption 
that $b_S$ is uniform from one feature to another. Here the effect of the mismatched base 
at position 13 is quite noticeable: removing a triple hydrogen bond (C≡G) raises the 
washing rate more than removing a double hydrogen bond (A=U) or (T=A). Conversely, 
the effect of a central mismatch on the washing rate is almost always greater when any 
of the remaining 24 bases is a weakly bound A or T than a strongly bound C or G. This 
is entirely in keeping with the washing scenario. 

7. Summary and Conclusions 

An understanding of the physical processes driving hybridization is essential if the 
design of expression measures is to advance to a point where target concentration 
can be measured in absolute terms. The aim of this paper has been to gain an 
improved understanding of the physics of oligonucleotide microarrays by exploiting the 
observed differences in the responses of PM and MM features to known cRNA target 
concentrations. The starting point of this paper is an adsorption model of hybridization 
at the surface of oligonucleotide microarrays based on models proposed independently by 
Hekstra et al. [12] and Halperin et al. [11]. Though arrived at from different approaches, 
the Hekstra and Halperin models are essentially equivalent, and are an improvement 
on their predecessors in that they allow for the presence of cross-hybridization from 
non-specific targets. 

Figure 3. Plots of the estimate of the washing rate $\kappa$ (times the washing time 
$t_W$, which is constant across all probes) given by Eq. (23) using Langmuir isotherm 
parameter fits for the PM probesets of the spike-in experiment described in Section 2. 
$\Delta G$ is the RNA/DNA duplex free binding energy in bulk solution calculated using the 
nearest neighbour stacking model of ref. [26]. The solid curve is the exponential fit 
Eq. (24) with parameter values given in the text. 

We have mainly concentrated on seeking to explain the commonly observed 
difference between fluorescence intensity measurements from a neighbouring PM/MM 
pair of features at high specific target concentration. That is, if a sufficiently high 
concentration of PM-specific RNA target is spiked in to the target solution, both 
the PM and MM fluorescence intensity signals will reach an asymptote, but the MM 
asymptote is almost invariably observed to be lower than the PM asymptote. Our 
starting Hekstra/Halperin model incorrectly predicts 100% coverage with PM-specific 
duplexes of both PM and MM features under these conditions, which in turn incorrectly 
implies that the asymptotic PM and MM fluorescence signals will be equal. 

Figure 4. Bar charts of estimates of $\kappa^{\rm PM} t_W$ (top), $\kappa^{\rm MM} t_W$ (middle) and 
$(\kappa^{\rm MM} - \kappa^{\rm PM})\,t_W$ (bottom) from Eq. (23), averaged over DNA probe sequences with base A, T 
(black, grey respectively, left hand plots), C or G (black, grey respectively, right hand 
plots) at each position along the PM probe sequence minus the corresponding averages 
over all probe sequences. 

We have sought to resolve this discrepancy firstly by taking a more detailed look 
at the hybridisation step, and secondly by examining the subsequent washing step. In 
general, we find that more detailed variants of our starting model of the hybridisation 
step, many of which have been independently suggested or alluded to previously, are 
unable to resolve the problem. Given our previous analysis of data from the Affymetrix 
Latin Square spike-in experiment [Hj, we are able to dismiss the Sips isotherm and non- 
equilibrium models of hybridization including multi-step models which take into account 
a slow initiation step followed by a rapid zipping up. We are also able to dismiss 
the effects of electrostatic screening at the microarray surface and bulk target-target 
hybridization as a possible explanation of differential PM/MM intensity measurements 
at saturation. 

We are as yet unable to dismiss entirely the possibility that competitive 
hybridization from probe-probe duplexes at the microarray surface renders a fraction 
of DNA probes unavailable to target molecules, as suggested by Forman et al. [T^ . To 
make progress with this problem, one needs to carry out a numerical simulation of a 
dimer-like statistical mechanics model on a two dimensional random lattice, probably by 
Monte Carlo methods. Analysis of the equivalent one dimensional model suggests that 
this form of competitive hybridization could well have a measurable quantitative effect 
on the equilibrium adsorption isotherm, though the two dimensional case is unlikely to 
lead to the observed hyperbolic response curve. 

By comparison, we find that the post-hybridisation washing step is able to provide 
a promising and straightforward explanation for the PM/MM difference at saturation. 
We have considered a scenario in which the equilibrium state predicted by our starting 
Hekstra/Halperin model is attained by the end of the hybridization step, following which 
the washing phase dissociates a fraction of bound duplexes. The portion of both the PM 
and MM signals above background decays exponentially during the washing phase, but 
since the MM binding affinity is less than that for PM features, the decay rate is faster 
for MM features. The results of our analysis of the dependence of inferred washing rates 
on probe base sequences support this scenario. The advantages of this model are that it 
preserves the observed hyperbolic shape of the Langmuir isotherm, and that it explains 
both the partial (i.e. < 100%) coverage of each feature by duplexes at saturation spike- 
in concentrations and the fact that the MM feature almost invariably asymptotes to a 
lower measured fluorescence intensity than its PM partner. 

The analysis presented in this paper argues that the solution to providing a 
practical method of estimating absolute concentration of target mRNA from microarray 
data lies in understanding the physics of hybridization and washing at the microarray 
surface. Ideally one would like to be able to estimate isotherm parameters from probe 
sequence information and physical parameters including microarray design parameters, 
hybridization temperatures and washing times. It is hoped that theoretical analysis can 
serve as a guide to the design of experimental work. In particular, the results set out 
in this paper illustrate a strong need for further spike-in experiments carried out with 
varying washing times or continuous monitoring of fluorescence intensities during the 
washing step. 

Appendix A. Statistical comparison of Langmuir and Sips isotherms 

In this appendix we carry out a statistical analysis of fits to the Langmuir isotherm, 
Eq. (j21), and the Sips isotherm 

$$y = y_0 + b\,\frac{x^\gamma}{x^\gamma + K^\gamma}, \tag{A.1}$$

to determine which model is the better fit to the MM data of the Affymetrix spike-in 
experiment. The method used is described in detail in an earlier paper which compares 
fits of the PM data to a number of isotherm modelslH]. 

Figure A1. Histogram of fitted values of the Sips parameter $\gamma$ for the MM data. 

The stochastic component of the fluorescence intensity $y$ is assumed to be drawn 
from a Gamma distribution. The data are fitted using the generalized linear model 
formalism as defined in ref. ^H] , in which the negative log likelihood of the fit, or deviance, 
is minimized over the parameters $y_0$, $b$, $K$ and, in the case of the Sips isotherm, also 
$\gamma$. To compare fits to the Langmuir and Sips models with $r_L$ and $r_S$ residual degrees of 
freedom and deviances $D_L$ and $D_S$ respectively, we use the scaled deviance 

$$\Delta D_{\rm scaled} = (D_L - D_S)\,\frac{r_S}{D_S}. \tag{A.2}$$

Note that $r_L > r_S \gg 1$. To evaluate the null hypothesis, $\gamma = 1$, $\Delta D_{\rm scaled}$ can be 
compared with a chi-squared distribution with $\Delta r = r_L - r_S$ degrees of freedom[^. 

We were able to obtain fits with positive parameter values to both the Langmuir 
and Sips isotherms for about 80% of the probes. For most of the remaining cases the 
MM response was too small to provide a useful fit. Results for the scaled deviance are 
shown in Table A1. The total deviance of 133.8 lies at the 13th percentile of a 
chi-squared distribution with 153 degrees of freedom, showing no reason to consider a more 
complex model than the Langmuir isotherm. Finally, a histogram of the fitted values of 
the Sips parameter, Fig. A1, shows that the Sips parameter is symmetrically distributed 
about $\gamma = 1$, as expected if the Langmuir isotherm is the more accurate model. 
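
The quoted percentile can be checked without statistical software. The sketch below is not the procedure used for the fits; it substitutes the Wilson-Hilferty cube-root normal approximation for the exact chi-squared distribution function, which is adequate for a large number of degrees of freedom. The function name is hypothetical.

```python
import math

def chi2_cdf_wilson_hilferty(x, dof):
    """Approximate chi-squared CDF using the Wilson-Hilferty cube-root normal approximation."""
    c = 2.0 / (9.0 * dof)
    z = ((x / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

percentile = 100.0 * chi2_cdf_wilson_hilferty(133.8, 153)
print(f"{percentile:.1f}")
```

The approximate value lies between 13 and 14, in line with the percentile stated above.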

Appendix B. Quasi-equilibrium model with nucleation 

We consider the hybridization model illustrated in Fig. B1, in which the forward, 
duplex forming, reaction involves two steps: a slow rate determining step in which 
the first two or three base pairs form, followed by a fast zipping-up step in which the 
remaining base pairs form. The probe and target molecules are denoted by P and T 
respectively, the partially formed duplex after the rate determining step by P.T*, and 
the completed target-probe duplex by P.T. For simplicity we consider the case without 
cross-hybridization. 

Table A1. Comparisons of fits to Langmuir and Sips isotherms. $\Delta r$ is the decrease 
in residual degrees of freedom for each gene and $\Delta D_{\rm scaled}$ is the corresponding scaled 
decrease in deviance from Eq. (A.2). 

Figure B1. Hybridization proceeding from probe plus target (P + T) to partially 
formed duplex in which two or three bases pair (P.T*) to a zipped-up duplex (P.T). 
$k_1$, $k_{-1}$, $k_2$ and $k_{-2}$ are chemical reaction rates. 

Let the target concentration be $x$, the fraction of probes in a feature which have 
formed a fully zipped up duplex P.T be $\theta$ and the fraction which have formed an initiated 
duplex P.T* be $\zeta$. The remaining fraction of free single strand probes is defined as 
$\chi = 1 - \theta - \zeta$. The chemical rate equations are 

$$\frac{d\chi}{dt} = -k_1 x \chi + k_{-1}\zeta, \tag{B.1}$$

$$\frac{d\theta}{dt} = k_2\zeta - k_{-2}\theta. \tag{B.2}$$

The reaction rates $k_1 x$ and $k_{-2}$ are assumed to be slow (on the order of hours) and the 
rates $k_{-1}$ and $k_2$ fast. Accordingly we define 

$$k_1 = \epsilon\kappa_1, \qquad k_{-2} = \epsilon\kappa_{-2}, \qquad \zeta = \epsilon\tilde\zeta, \tag{B.3}$$

where $\epsilon \ll 1$. This gives 

$$\frac{d\chi}{d\tau} = -\kappa_1 x \chi + k_{-1}\tilde\zeta, \tag{B.4}$$

$$\frac{d\theta}{d\tau} = k_2\tilde\zeta - \kappa_{-2}\theta, \tag{B.5}$$

where $\tau = \epsilon t$ is $O(1)$ on timescales of the slow nucleation reactions. We solve these 
equations to zeroth order in $\epsilon$, subject to the constraints $\theta + \chi = 1 + O(\epsilon)$ and 
$d\theta/d\tau = -d\chi/d\tau + O(\epsilon)$. Eliminating $\tilde\zeta$ and $\chi$ with the help of the constraints gives 

$$\frac{d\theta}{d\tau} = \frac{\kappa_1 k_2 x}{k_{-1} + k_2} - \frac{\kappa_1 k_2 x + \kappa_{-2}k_{-1}}{k_{-1} + k_2}\,\theta + O(\epsilon). \tag{B.6}$$

The solution to zeroth order, with initial condition $\theta(0) = 0$, is 

$$\theta(t) = \frac{k_1 k_2 x}{k_1 k_2 x + k_{-1}k_{-2}}\left\{1 - \exp\left[-\frac{k_1 k_2 x + k_{-1}k_{-2}}{k_{-1} + k_2}\,t\right]\right\}, \tag{B.7}$$

after reinstating the original variables. This is of the form Eq. (13) where $K_S = 
k_{-1}k_{-2}/(k_1 k_2)$ and $k_f = k_1 k_2/(k_{-1} + k_2)$. 
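
The quality of the zeroth order approximation can be gauged by integrating the full rate equations numerically. The Python sketch below uses a classical fourth order Runge-Kutta scheme, which is our choice for illustration, and compares the result with the Langmuir-type solution defined by $K_S$ and $k_f$. The function names and the rate constants are hypothetical; the rates are chosen so that $k_1 x$ and $k_{-2}$ are about three orders of magnitude smaller than $k_{-1}$ and $k_2$.

```python
import math

def integrate_two_step(x, k1, km1, k2, km2, t_end, dt=0.05):
    """Integrate the two-step rate equations with classical RK4; returns theta(t_end)."""
    def rhs(chi, theta):
        zeta = 1.0 - chi - theta  # fraction of initiated duplexes P.T*
        return (-k1 * x * chi + km1 * zeta, k2 * zeta - km2 * theta)

    chi, theta = 1.0, 0.0  # all probes initially free
    for _ in range(int(round(t_end / dt))):
        a1, b1 = rhs(chi, theta)
        a2, b2 = rhs(chi + 0.5 * dt * a1, theta + 0.5 * dt * b1)
        a3, b3 = rhs(chi + 0.5 * dt * a2, theta + 0.5 * dt * b2)
        a4, b4 = rhs(chi + dt * a3, theta + dt * b3)
        chi += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        theta += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return theta

def quasi_equilibrium_theta(x, k1, km1, k2, km2, t):
    """Zeroth order solution: Langmuir form with K_S and k_f as defined in the text."""
    k_s = km1 * km2 / (k1 * k2)
    k_f = k1 * k2 / (km1 + k2)
    return x / (x + k_s) * (1.0 - math.exp(-(x + k_s) * k_f * t))

# Illustrative rates (arbitrary time units): slow k1*x and k_-2, fast k_-1 and k2.
params = dict(x=1.0, k1=1e-3, km1=1.0, k2=2.0, km2=5e-4)
for t in (500.0, 1000.0, 4000.0):
    print(t, integrate_two_step(t_end=t, **params), quasi_equilibrium_theta(t=t, **params))
```

With these rates the two results differ only at the level expected of an $O(\epsilon)$ correction.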

Appendix C. Equilibrium model with competition between probe-target 
and probe-probe duplexes 

We consider here the equilibrium thermodynamics of the microarray surface when 
pairwise interactions between neighbouring probes and self interaction of individual 
probes are taken into account. The formation of probe-probe duplexes or folded probes 
will render a fraction of the probes unavailable for RNA target hybridization. 

For a given feature, define $M$ to be the total number of probe sites on that feature, 
$N$ to be the number of probe-target duplexes, $P$ to be the number of probe-probe duplexes 
and $Q$ to be the number of self interacting (i.e. folded) probes. In this appendix we 
will for simplicity ignore hybridization of non-specific targets and partial zippering. The 
number of configurations consistent with the above partitioning is 

$$\Omega = \nu(P, M)\,\frac{(M - 2P)!}{N!\,Q!\,(M - N - 2P - Q)!}, \tag{C.1}$$

where $\nu(P, M)$ is the number of ways of forming $P$ neighbouring pair duplexes on an 
array of $M$ sites, where $0 \le 2P \le M$. The contribution to the canonical partition 
function from the entire feature is 

$$e^{-M\gamma/k_B T} = \Omega\,\exp\left[-\frac{\mu^0_{PT}N + \mu^0_{PP}P + \mu^0_{Q}Q + \mu^0_{P}(M - N - 2P - Q)}{k_B T}\right], \tag{C.2}$$

where $\gamma$ is the free energy per site, $\mu^0_{PT}$, $\mu^0_{PP}$, $\mu^0_{Q}$ and $\mu^0_{P}$ are reference state chemical 
potentials per site of a probe-target duplex, probe-probe duplex, self interacting probe 
and unmatched probe respectively, and $k_B$ is Boltzmann's constant. 

For illustrative purposes we begin with an analysis of the relatively easily solved 
one dimensional model. For a one dimensional lattice in which nearest neighbour sites 
may form duplexes, one easily obtains $\nu(P, M) = (M - P)!/[P!\,(M - 2P)!]$, and hence 

$$\Omega = \frac{(M - P)!}{P!\,N!\,Q!\,(M - N - 2P - Q)!}. \tag{C.3}$$

Applying the Stirling approximation $\ln N! = N\ln(N/e) + O(\ln N)$ and setting 

$$\begin{aligned}
\theta &= \frac{N}{M} = \text{fraction of feature covered by P-T duplexes},\\
\zeta &= \frac{2P}{M} = \text{fraction of feature covered by P-P duplexes},\\
\xi &= \frac{Q}{M} = \text{fraction of feature covered by self interacting probes},\\
f &= \frac{R}{k_B}\,\gamma = \text{surface free energy per mole of probe sites},
\end{aligned}$$

gives, in the bulk limit $M \to \infty$, 

$$\begin{aligned}
f = {} & \mu^0_{PT}\theta + \tfrac{1}{2}\mu^0_{PP}\zeta + \mu^0_{Q}\xi + \mu^0_{P}(1 - \theta - \zeta - \xi) \\
& + RT\left[-(1 - \tfrac{1}{2}\zeta)\ln(1 - \tfrac{1}{2}\zeta) + \tfrac{1}{2}\zeta\ln\tfrac{1}{2}\zeta + \theta\ln\theta + \xi\ln\xi \right. \\
& \left. \qquad\quad + (1 - \theta - \zeta - \xi)\ln(1 - \theta - \zeta - \xi)\right],
\end{aligned} \tag{C.4}$$

where $\mu^0_{PT}$, $\mu^0_{PP}$, $\mu^0_{Q}$ and $\mu^0_{P}$ are reference state chemical potentials per mole and $R$ is the 
gas constant. 

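The equilibrium isotherm is obtained by balancing exchange chemical potentials for 
P-T duplexes with the chemical potential of the target species in solution and setting 
the chemical potentials for P-P duplexes and self interacting probes to zero, that is 

$$\frac{\partial f}{\partial\theta} = \mu_t, \qquad \frac{\partial f}{\partial\zeta} = 0, \qquad \frac{\partial f}{\partial\xi} = 0, \tag{C.5}$$

where $\mu_t$ is given by Eq. (jlj). This leads to 

$$\theta = (1 - \zeta - \xi)\,\frac{x}{x + K_S}, \tag{C.6}$$

$$\tfrac{1}{2}\zeta\,(1 - \tfrac{1}{2}\zeta) = K_P\,(1 - \theta - \zeta - \xi)^2, \tag{C.7}$$

and 

$$\xi = \frac{K_Q\,(1 - \theta - \zeta)}{1 + K_Q}, \tag{C.8}$$

where the equilibrium constants $K_S$ for the P-T duplex forming reaction, $K_P$ for the 
P-P duplex forming reaction and $K_Q$ for the self interaction are 

$$K_S = x_0 e^{\Delta G/RT}, \qquad K_P = e^{-\Delta G_P/RT}, \qquad K_Q = e^{-\Delta G_Q/RT}, \tag{C.9}$$

where 

$$\Delta G = \mu^0_{PT} - \mu^0_{P} - \mu^0_{t}, \qquad \Delta G_P = \mu^0_{PP} - 2\mu^0_{P}, \qquad \Delta G_Q = \mu^0_{Q} - \mu^0_{P}. \tag{C.10}$$

Eliminating $\xi$ gives 

$$\theta = (1 - \zeta)\,\frac{x}{x + K'_S}, \tag{C.11}$$

$$\tfrac{1}{2}\zeta\,(1 - \tfrac{1}{2}\zeta) = K'_P\,(1 - \theta - \zeta)^2, \tag{C.12}$$

where $K'_S = (1 + K_Q)K_S$ and $K'_P = K_P/(1 + K_Q)^2$. That is, the effect of probe self 
interaction is to rescale the remaining equilibrium constants. 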

From Eq. (C.12) one finds that the P-P coverage fraction decreases smoothly from 
a maximum value $\zeta_{\max} = 1 - \tfrac{1}{2}\left(K'_P + \tfrac{1}{4}\right)^{-1/2}$ at $\theta = 0$ to zero at $\theta = 1$. This has two 
consequences. Firstly, there is no phase transition, as expected for a one dimensional 
model with local interactions. Secondly, we see from Eq. (C.11) that $\theta$ asymptotes to 
1 in the limit of high target concentration $x \to \infty$. Thus the simple one dimensional 
model of P-P duplexes is unable to explain partial saturation of the feature at high 
concentration. A plot of $\theta$ against target concentration for a range of values of $K'_P$ is 
given in Fig. 2. 
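
The one dimensional isotherm is simple enough to evaluate directly. Writing $X = x/K'_S$, the relation $\theta = (1 - \zeta)X/(1 + X)$ implies $1 - \theta - \zeta = (1 - \zeta)/(1 + X)$, so that the condition $\tfrac{1}{2}\zeta(1 - \tfrac{1}{2}\zeta) = K'_P(1 - \theta - \zeta)^2$ becomes a quadratic in $\zeta$ with $K'_P$ replaced by $K'_P/(1 + X)^2$. The closed form used in the Python sketch below is our own rearrangement along these lines and should be read as an illustration; the function name and the sample values of $K'_P$ are hypothetical.

```python
import math

def one_dim_isotherm(x_scaled, kp_eff):
    """Return (theta, zeta) for the one dimensional model at scaled concentration X = x/K'_S.

    kp_eff is the effective probe-probe equilibrium constant K'_P.
    """
    k = kp_eff / (1.0 + x_scaled) ** 2  # K'_P reduced by target occupancy
    zeta = 1.0 - 1.0 / math.sqrt(1.0 + 4.0 * k)  # physical root of the quadratic in zeta
    theta = (1.0 - zeta) * x_scaled / (1.0 + x_scaled)
    return theta, zeta

for kp in (0.0, 1.0, 4.9, 23.7):  # illustrative values of K'_P
    row = [round(one_dim_isotherm(X, kp)[0], 3) for X in (0.1, 1.0, 10.0, 1000.0)]
    print(kp, row)
```

For $K'_P = 0$ the Langmuir isotherm is recovered, and for every value of $K'_P$ the coverage $\theta$ tends to 1 at large $X$, in agreement with the conclusion drawn above.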

Ideally we need to solve the model defined by Eq. (C.2) for a random two 
dimensional lattice. The presence of self interactions of individual probes involves no 
interaction between sites and consequently cannot complicate the phase structure. In 
fact, by comparing Eqs. (jH)) and (C.4) we see that, for the purposes of determining phase 
structure, probe self interactions and non-specific hybridization are mathematically 
identical problems. Nearest neighbour probe-probe interactions, on the other hand, are 
less tractable. In this case one needs to calculate $\nu(P, M)$ for a random two dimensional 
lattice with some reasonable definition of 'neighbouring'. To analyze the bulk limit, 
it can be shown that one only needs $(1/M)\log\nu(P, M)$ in the limit $M, P \to \infty$ for 
given fixed $2P/M$. This is the random lattice analogue of the monomer-dimer model 
which is usually defined on a regular two dimensional lattice, and for which no exact 
solution has been found. For a square lattice, though, numerical calculations strongly 
suggest the model has no phase transition at non-zero monomer density [2]. (At zero 
monomer density, that is $2P = M$, the square lattice monomer-dimer model is critical, 
corresponding to the critical point of the Ising model [T^.) 

A review of most of the two dimensional statistical models which have been solved 
exactly can be found in ref. [3]. These include the close packed dimer model on a square 
lattice, which is equivalent to calculating $\nu(\tfrac{1}{2}M, M)$, and the hard hexagon model, in 
which sites of a triangular lattice are occupied subject to the constraint that no two 
neighbouring sites may be occupied simultaneously. The model we are interested in is 
similar in some ways to the hard hexagon model, except that in our case links of a lattice 
are occupied subject to the constraint that no two adjoining links may be simultaneously 
occupied. The hard hexagon model does undergo a phase transition between a liquid 
phase (uncorrelated positioning of hexagons at low density) and a solid phase (close 
packing of hexagons centred on one of three possible sublattices). Whether the random 
lattice duplex model relevant to the case in hand undergoes a phase transition from a 
disordered phase at low duplex density or high temperature to an ordered phase at high 
duplex density or low temperature is unknown. 